Practical Methods for Ab Initio Calculations on Thousands of Atoms 
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I. INTRODUCTION 

Over the last thirty years, the use of ab initio electronic 
structure techniques has become widespread in chemistry 
and physics. However, all traditional techniques are lim- 
ited in the system size which they can treat (often de- 
fined by the number of atoms, N) by poor scaling of 
the computational effort required, which is generally at 
least 0{N'^), if not worse. In the context of standard self- 
consistent field (SCF) quantum chemical methods such as 
Hartree-Fock theory or density functional theory (DFT), 
there are two demanding parts of a calculation: first, 
the build of the Fock matrix (or, in DFT, the Hamil- 
tonian matrix), which can scale as 0{N'^); second, the 
solution for eigenvectors of the Fock matrix, which scales 
as 0{N^) if performed as a diagonalisation; this only 
dominates for large values of N. An alternative to di- 
agonalisation is iterative minimisation, which has 0{N'^) 
scaling (one N dependence comes from the eigenvectors 
spreading over all space, and the other from the num- 
ber of eigenvectors, which depends on TV); if, as is of- 
ten the case, the eigenvectors must be orthogonalised to 
each other, this leads asymptotically to 0{N^) scaling. 
Whatever the cause of poor scaling, it results in a prac- 
tical limit of a few hundred atoms in conventional ab 
initio techniques. The desire to model large systems, e.g. 
biomolecules or nanostructures, has seen a strong push 
in recent years to achieve linear scaling with system size. 

The building of the Fock matrix (and specifically the 
Coulomb and exchange terms which are the most expen- 
sive) with linftarp^caling has been addressed recently by 
several groupsld^El; however, this aspect will not concern 
us here. Rather, we will consider linear scaling techniques 
(which also rely on iterative minimisation) for finding the 
self-consistent ground state of the system. 

Recently, many linear scaling techniques have been 
proposed, Y^'M^ ^^^ ^^^ based on the search for the den- 
sity matrixQcJ. They start from the observation that 



the density matrix between two points in space decays in 
some manner as those two points increase in separation. 
This is intuitively clear, as it is well-known that bonding 
is local. The result is that the electronic structure of an 
atom depends only on its local environment, so that if 
the overall size of the system changes, there is no effect 
on the local electronic structure; thus the effort required 
to solve for the whole system should be proportional to 
N. This is the foundation of all linear scalmg electronic 
structure methods (with a few exceptionsOEEl which will 
not concern us here). 

The paper is divided up as follows: Section fl describes 
the basic theory behind our 0{N) DFT method, while 
Section [II presents recent advances we have made in 



searching for the electronic ground state. The future di- 
rections the work will take are shown in Section IV , and 
the paper is concluded in Section ^ 



II. 0{N) DENSITY FUNCTIONAL THEORY 

The density matrix within density functional theory 
(DFT) can be written as: 



p(r,r') 



J2f^Mrm{r'), 



(1) 



where ipi is a Kohn-Sham eigenfunction, and fi is the 
occupancy of that eigenfunction. The key observation 
which underpins 0{N) DFT is that DFT can be for- 
mulated in terms of p(r,r'), and that the ground state 
can be found by minimising the total energy, -Etot, with 
respect to p(r, r') subject to the condition that p(rpii') 
is idempotent (these statements are proved elsewheretj) . 
Idempotency means that p- p = p, which is equivalent to 
the eigenvalues of p (which are the occupation numbers 
fi) being either zero or one. This is a crucial property 
for the density matrix, as it is ensures that the density 



matrix is a projector - it is the operator which projects 
onto the space of occupied states. 

Another important property of the density matrix is 
that it decays as the separation between points increases: 



P(r, 



as I r - r' 



CX). 



(2) 



The fundamental reason for this decay is the loss of quan- 
tum phase coherence between distant points. The net re- 
sult is that the local environment is all that is important 
in determining p(r, r'), and thus that the amount of infor- 
mation contained in p is proportional to N . This decay 
property of the density matrix can be used to make the 
amount of information in the system strictly linear with 
the system size by imposing a constraint, and setting the 
density matrix to zero beyond a cutoff: 



p(r,r') = 0,|r-r'|>i?e 



(3) 



where Re is some cut-off radius. Solving for the energy 
with this constraint imposed, and that of idempotency, 
will lead to an upper bound on the ground state energy^ ; 
as Re is increased, it will converge to the true ground 
state. Clearly there is a balance to be struck between ac- 
curacy, which increases as Re is increased, and the com- 
plexity of the computational problem (e.g. number of 
variational degrees of freedom associated with p) , which 
also increases with Re- 

This exact formulation cannot be followed for practi- 
cal methods, as p is dependent on two vector positions. 
Instead, a further approximation is introduced, namely 
that p be separable, so that it can be written in the form: 



p(r,r') 



E 



4'ia{r)KiaJi34>jf3{r'), 



which is equivalent to requiring that p only have a finite 
number of non-zero eigenvalues. The functions (jjia (r) are 
known as 'localised orbitals', where i runs over atoms and 
a over localised orbitals on each atom. The matrix Ktajp 
is the density matrix in the basis set of {(j)ia{r)} (and is 
identical to the density matrix commonly seen in 0{N) 
tight binding schemes; indeed many of these schemes can 
be used to solve for this matrix within 0{N) DFT). 

The localisation of p is accomplished by setting the 
localised orbitals to be non-zero only inside a certain ra- 
dius, i?rog, centred on the atoms i. A spatial cutoff (not 
generally of the same value) must also be imposed on the 
matrix K, so that Kia^jp — once atoms i and j are 
more than a certain distance apart. 



A. A Specific Implementation: CONQUEST 



The framework described above is completely general; 
we will no w cpacc afa'ate on our specific implementation, 
ConquestMeIO^EZI. This is based on the pseudopoten- 
tial approach to DFT (described briefly in Appendix ^), 
and has been constructed so as to be as accurate as con- 
ventional ab initio pseudopotential calculations, which 
use plane waves as a basis set. 

In practice, there are various issues which must be 
addressed: minimising the total energy with respect to 
Kia,ji3 while maintaining idempotency and spatial local- 
isation; representing the localised orbitals; the cutoffs re- 
quired to achieve good convergence to the true ground 
state; and practical questions, including integration and 
implementation on parallel comp uters. These have been 
addressed in detail elsewhereEaO; we present a brief sum- 
mary here. 

The imposition of idempotency on K during the min- 
imisation of ii^tot with respect to Kiajp is the hard- 
est constraint to maintain. There are several proposed 
means of accomplishing this; the method described-|here 
is based on the purification technique of McWeenynS, re- 
cently usedjin tight binding calculations by Li, N unes and 
VanderbiltQ, and described in detail in Section [II A. It 



requires K to be written in terms of an 'auxiliary' density 
matrix, L: 



K = iLSL - 2LSLSL, 



(4) where S is the overlap matrix: 



(5) 



(6) 



The localisation of K is then imposed as a spatial cutoff 
on L: 



SiajP = / AY(l)ia{Y)(l)ji3{r). 



''ia^jp 



= 0, 



R; — R. j |> ^L, 



(7) 



where R^ is the position of atom i and -Rl is a cutoff 
radius. The energy is then minimised with respect to 
the matrix element&-iiQ,j;3 using the standard conjugate 
gradients techniqueE£l. The effect of the purification is 
to make K more nearly idempotent given an L which is 
nearly idempotent. 



^The idempotency constraint on p will make it the projec- 
tion operator onto the occupied states. The energy of the 
ground state given by these states will be the energy for the 
system under the constraint of a localised p. Since DFT is 
variational, this extra constraint will raise the energy above 
the true ground state energy (i.e. without localisation), and 
so will give an upper bound to the true ground state energy. 



The next issue to consider is the representation of the 
locaUsed orbitals. The lessons learnt from the use of 
plane waves in conventional pseudopotential calculations 
are helpful to remember here. First, they offer a system- 
atic convergence of the energy with respect to the basis 
set completeness, and this is achieved with a single pa- 
rameter (the cutoff energy). Second, they are bias free - 
that is, they are completely flexible, and no knowledge of 
the kind of bonding in the system is required. If possible, 
the choice of basis set would reflect these qualities. 

Conquest represents the localised orbitals in a real- 
space basis. There are various possibilities for an effi- 
cient, real-space basis. One is to use the spherical equiva- 
lent of plane waves, that is spherical Bessel functions ji (r) 
combined with spherical harmonics Yj^{r), within each 
localisation region. This representation has been dis- 
cussed by Haynes and PayneEJ, but practical results haKe 
not yet been reported. Another 0{N) DFT schemecll 
uses pseudo-atomic orbitalstd as the basis functions, with 
considerable success. An alternative is to represent the 
4'ia by their values on a grid, and to calculate matrix el- 
ements of the kinetic energy by taking finite differences. 
This technique is well—established in conventional first 
principles calculationsE3, and haSpbeen investigated in the 
context of ©(AL) techniques by usli3 and recently by Hoshi 
and Fujiwarao. At present we use a basis of B-splines, 
0(r), (also called blip functions) in the expansion: 



0ia(r) 



E' 



R-is), 



(8) 



where the B-splines are piecewise polynomial functions 
(continuous up to the third derivative) which are strictly 
localised on the points of a grid (notated as R^s above) 
which is rigidly attached to each atom (known as the blip 
grid). The energy is then minimised with respect to the 
coefficients of the B-splines, bias- 

Having described the minimisation of the total energy 
with respect to both the K matrix and the localised or- 
bitals, it is now appropriate to consider the practical 
performance of the method: are the cutoffs required to 
achieve convergence to the ground state small enough to 
be practical ? Tests on a model, local pseudopotential 
and standardj-Jiaii-local pseudopotentials have already 
been reportedc2l'E3, and show that for reasonable cutoffs, 
good convergence is obtained. We reproduce some of 
these results in Figure |l|. Fig. Il|(a) shows the calculated 
total energy as a function of i?reg for Si. The results 
show that -Etot converges to the correct value extremely 
rapidly once -Rrog is greater than 4 A. For this radius, 
each localisation region contains 17 neighbouring atoms, 
and the calculations are perfectly manageable. Fig. |l](b) 
shows the total energy for R-ccg — 2.715 A, as a func- 
tion of Rl ■ Rather accurate convergence to the Ri, — cg 
value is obtained for i?L > 8 A, which again is accept- 
able. No value is shown for exact diagonalisation because 
of technical difficulties in performing comparisons. 

To perform integrations such as Siajp = 
/ AY(f>ia{Y)(f>jp{Y) numerical integration on a grid is used. 



This integration grid is generally of different spacing to 
the blip grid (and normally about twice as fine). Most 
matrix elements are found by integration on this grid 
(with the exception of fast-varying quantities which are 
calculated analytically), and the localised orbitals are 
projected from the blip grid to the integration grid in 
a manner similar to a fast Fourier transform (FFT), 
called a blip-to-grid transform. Once the charge density 
is calculated on the grid (as n(r) = p(r,r)), the Hartree 
potential and energy are found using FFTs. 

Conquest has been designed with pargdkl computers 
in mind; here we summarize the strategyE3. Each pro- 
cessor has three responsibilities. First, a group of atoms: 
it holds the blip coefficients, bias and their derivatives of 
the energy, dEtot/dbtas and is responsible for performing 
the blip-to-grid transforms for these atoms. It also stores 
the rows of matrices corresponding to these atoms and 
performs the matrix multipications for these rows. Sec- 
ond, a domain of integration grid points: it has respon- 
sibility for calculating contributions to matrix elements 
arising from sums over these points, and for holding the 
electron density and the Kohn-Sham potential on these 
points. Third, part of the spatial FFT for the Hartree 
potential: it deals with a set of columns in the x,y ot 
z directions. All processors switch between tasks in a 
concerted manner. 

To test the efficiency of the scheme, we have extensively 
tested its scaling properties. There are two completely 
different kinds of scaling: parallel scaling (i.e. scaling 
of CPU time for a given system with varying numbers 
of processors); and inherent scaling (i.e. scaling of CPU 
time for a fixed number of processors as the system size 
varies. In the present implementation of Conquest, 
both types of scaling are excellentEj. 

The overall Conquest scheme can be summarised as 
follows: the ground state energy and density matrix of 
the system are found by minimising the energy Etot with 
respect to the elements of the auxiliary density matrix, 
Lia,ji3, and the locahsed orbitals, (f)ia[r), subject to the 
spatial cutoffs Rl and i?rcg- This yields an upper bound 
to the true ground state, which improves as the cutoffs 
are increased. 



III. STRATEGIES FOR REACHING THE 
GROUND STATE 



Having described the specific manner in which Con- 
quest is implemented, we now consider ways of reaching 
the ground state efficiently and robustly. At present, the 
minimisation is carried out in three separate stages: first, 
the ground state density matrix is found for a given set 
of localised orbitals; second, self-consistency is achieved 
between the charge density and the potential (which in- 
cludes further density matrix minimisation in response 
to the new potential); finally, the form of the localised 
orbitals is changed in accordance with the gradient of 



the energy. The inner loops (density matrix minimisa- 
tion and self-consistency) are then repeated. Schemes 
for efficiency and robustness for these three stages are 
now discussed. 



A. Density matrix minimisation 

As has already been emphasised, the density matrix 
of the true electronic ground state is idempotent. This 
important property is hard to impose during a minimi- 
sation; however, a number of schemes which drive the 
matrix towards idempotency have been suggested. For 
simplicity, we will consider these schemes in the frame- 
work of orthogonal tight binding theory; the extension 
to the non-orthogonal case and DFT is sipiple enough. 
The first scheme was proposed by McWeenjfiS, who noted 
that if a matrix p is close to idempotency, then the matrix 
p given by: 



3p2 - 2p3 



(9) 



will be more nearly idempotent. It has the effect of driv- 
ing the eigenvalues of p towards zero and one (this can 
be seen by considering the function 3A^ — 2A'^, which is 
shown in Figure ||). If this transformation (often called 
the McWeeny transformation or purification transforma- 
tion) is applied iteratively (writing pn+i = 3/9^^ — 2p^ 
for iteration n -\- 1), then the sequence of matrices gen- 
erated will converge on an idempotent matrix. In fact, 
this transformation is quadratically convergent (i.e. if the 
idempotency error in p is Sp, then the idempotency error 
in p is (5p2). 

Falser and ManolopoulosE3 have recently suggested us- 
ing this iterative scheme in an 0{N) manner. They point 
out that if the initial density matrix is a linear function of 
the Hamiltonian, with eigenvalues between zero and one, 
then the iteration will converge to the correct ground 
state density matrix (given by 9{p — H), where 9{x) is 
the Heaviside step function {6=1 for a; > and = for 
x < 0) and p is the chemical potential for electrons, or 
the Fermi energy), and that the energy, E = 2Tr[p„i/], 
will decrease monotonically at each step. This procedure 
has the advantage of being fast (it only requires two ma- 
trix multiplies per iteration) and efficient (it converges 
quadratically). Unfortunately, when a localisation crite- 
rion (also called a truncation) is applied to the density 
matrix to achieve linear scaling, the monotonic decrease 
of energy will fail at some point in the iterative search. 
This can be taken as an indication that truncation er- 
rors are dominatin£_the calculation, and that the search 
should be stoppedEEl; indeed, if it is not stopped, there 
is no guarantee that it will continue to converge towards 
an idempotent matrix. This is a heuristic criterion for 
stopping the iteration, and has the drawback that the 
method will not be variational, so that analytic forces 
will not be in agreement with the numerical gradient of 
the energy. 



The Li, Nunes and Vanderbilt (LNVJU ^scheme for 
achieving the ground state density matrixQ'L^ also uses 
the McWeeny transformation, though in a rather differ- 
ent manner. Here, the energy is written as E = 2Tt[pH], 
with p given by equation M. Then the energy is minimised 
with respect to the elements .of p, typically using a scheme 
such as conjugate gradientsEj to generate a sequence of 
search directions. The localisation of the density matrix 
is achieved by applying a spatial cutoff to the elements 
of p. This scheme has at least two advantages: first, each 
line minimisation can be performed analytically, as the 
energy is cubic in p; second, it is variational, so that the 
energy found is always an upper bound to the ground 
state, and forces obey the Hellmann-Feynman theorem 
and are in exact agreement with the numerical deriva- 
tive of the energy. 

However, there are drawbacks to the LNV technique. 
First, it is unclear what the best initial value should be for 
the density matrix; typically, it is taken to be ^I, or ^S"^ 
in a non-orthogonal basis set. Second, as the McWeeny 
transformation is a cubic, it is unbounded from below, 
and a poor starting choice for the minimisation can lead 
to runaway solutions; a sign of this is typically that the 
cubic has complex extrema. Third, the scheme can be 
poorly convergent, and is not guaranteed the quadratic 
convergence of the McWeeny method. 

We have recently proposed a hybrid between the 
McWeeny and LNV schemes which buii^ on the comple- 
mentarity between these two methodaSZl. It is based on 
the observation that the sequence of matrices generated 
during a McWeeny iterative search converges quadrat- 
ically towards idempotency, and that the LNV search 
direction maintains idempotency in the density matrix 
to first order. Thus the McWeeny scheme is used as an 
initialisation to find an idempotent density matrix (but 
one which is not the ground state matrix because of trun- 
cation errors); this matrix is then used as the input to 
the LNV scheme, which maintains the idempotency while 
searching efficiently for the ground state density matrix. 
The combination of the two methods is both variational 
and robust - two highly desirable attributes for the inner 
loop of an 0{N) DFT method. 

As an example of the improved speed of convergence 
given by the hybrid scheme. Figure ^ shows the con- 
vergence to the ground state energy in diamond carbon 
for the LNV stage of the hybrid scheme and pure LNV 
(initialised from p = ^I). These results show that the 
McWeeny stage of the hybrid scheme gets closer to the 
ground state as the radius is increased, as expected, and 
that it acts as an excellent initial density matrix for the 
LNV scheme. The method has also been tested on a va- 
cancy in diamond C_ the Si(OOl) surface and liquid Si, as 
reported elsewhere^]. 



B. Non-orthogonality 



R[r] = p[p" 



(12) 



The previous section focuses on the traditional tight 
binding scheme where the locahsed orbitals are taken 
to be orthogonal. However, in our 0{N) method, the 
orbitals are non-orthogonal, and this introduces a sig- 
nificant degree of complication. In strict mathematical 
terms, the metric for the space spanned by the localised 
orbitals must be chosen with care; this means defining 
a scalar product in a specific way, as explained in detail 
in Appendix pi along with other implications of using 
non-orthogonal orbitals. We will consider a few simple 
implications of this theory here. 

For any given set of non-orthogonal orbitals, {(t>i}, an 
orthogonal set can be defined by using the overlap ma- 
trix, S: 



EiS- 



1/2V 



(10) 



where we use an over-bar to indicate the quantities in 
the non-orthogonal case. If the metric is chosen suitably, 
then similar transformations between the Hamiltonian 
and density matrices in the two spaces can be defined: 






(11) 



There are various points which can be drawn from the 
above equations. First, the McWeeny transformation 
(and the other quantities associated with a minimisa- 
tion such as the gradient of the energy) will change in 
the new basis set; in fact the McWeeny transformation 
becomes p — 3pSp ~ 2pSpSp. Second, there are differ- 
ent types of matrix in the non-orthogonal case, one of 
which transforms with S^^^ and the other with S~^^^ (in 
fact there are two types of orbital and hence four types 
of matrix); great care must be taken to combine these 
matrices in the correct fashion, as pointed out by White 
et alE3 for the case of the gradient of the energy with 
the non-orthogonal density matrix. Third, there may be 
a considerable advantage in choosing the metric so that 
the transformations shown in equation |ll| apply, as this 
will enable direct comparison with the orthogonal case. 
The interested reader is referred to Appendix H and ref- 
erences therein for more details. 



where p[p™] is the output charge density: that is, the po- 
tential arising from pin is found (consisting of Hartree and 
exchange-correlation parts), the Schrodinger equation is 
solved, and the output charge density generated from the 
resultant wavefunctions. Clearly, the aim is to reduce 
R[p™] to as close to zero as possible in the least number 
of iterations. The simplest possible technique is to use 
the output charge density from one cycle (p°"* — p[Pn]) 
as the input for the next cycle: p^+i = Pn"*i however, this 
is potentially rather slow, and prone to the phenomenon 
known as 'charge sloshing', where long wavelength vari- 
ations of the charge in the unit cell dominate the self- 
consistency procedure. There are in fact cases where this 
simple method fails to work at all, and self-consistency 
is never reached. This is clearly unacceptable. 

Better than this is to perform a linear mix of the two 
previous charge densities, so that: 



'■"n+l 



(i-A)p™\+Apr 



(13) 



The value of lambda can be found easily. If the residu- 
als (defined above in equation |l^) are treated as vectors 
(considering the value at each spatial position as an en- 
try in the vector), then scalar products can be formed 
between residuals, and the norm of a residual can be 

1 /9 

found as {{R[Pn+i] \ R[Pn+i])) ■ The optimum value 
ot^ is found by minimising this norm with respect to 

aeI. 



This gives: 



{R[p^] I R[p^] - J?[p'f_i]) 

{R[pif] - R[pif_,] I i?[pjf] - i?[pri]) ■ 



(14) 



This procedure can be generalised to more than two pre- 
vious densities, wiiidican give significant benefits, as de- 
scribed elsewher(£3'Lj. It is often important to mix in 
a small amount of the input charge densities, as well as 
performing the mixing given above. 

As well as combining charge densities in the optimum 
manner, it is important to suppress the phenomenon of 
'charge sloshinglf-an ideal way to do this is to use Kerker 
preconditioningcS. Here a scaling is applied in reciprocal 
space to the residual: 



C. Charge mixing and self-consistency 

The question of achieving self-consistency between the 
charge density and the potential has been examined in 
great detail over many jpars, and much is known about 
efficient implementationEl. Within Conquest, the di- 
rect invecsion of the iterative subspace (DIIS) method 
of PulaycJ is used. At each iteration, a residual can be 
defined as: 



i?[pj>] = R\(^] 



q- 



1 ' 

% 



(15) 



where q is a reciprocal space vector, and q^ is chosen 
suitably (a value close to 2-k ja^^ where oq is a lattice 
vector, is appropriate). This scaling is an approximation 
to the inverse dielectric function, and enables fast and 
robust iteration to a self-consistent charge density and 
potential. 



D. Pre-conditioning localised orbital variation 

Now that we have described the robust and efficient 
search for the ground state density matrix (for a given 
set of localised orbitals) and the fast iteration to a self- 
consistent charge density and potential, we must consider 
the variation of the localised orbitals. 

As is the case for minimisation problems in many ar- 
eas of science, Conquest suffers from ill conditioning 
in the search for the ground state when varying the lo- 
calised orbitals. Ill conditioning occurs if the function 
being minimised has a wide range of curvatures. For a 
general function f{xi,X2,---,XN), dependent upon vari- 
ables {xi}, the curvature matrix (or Hessian) can be de- 
fined as Cij = d'^f/dxidxj. If the eigenvalues A„ of 
Cij span a wide range, then the surfaces of constant 
/ are elongated (illustrated in Figure 0), and. conven- 
tional techniques such as conjugate gradientsEj will be- 
come very inefficient. It is known that the number of iter- 
ations required by conjugate gradients is proportional to 
(Amax/Amin)^^^, whcre Amax and-Amin are the maximum 
and minimum eigenvalues of CijE3. 

While ill conditioning is a widespread problem, the so- 
lution depends on the specific situation. Conventional 
first principles calculations which use plane waves as a 
basis set have been xepagnised for many years to suffer 
from ill conditioningCJ'E3, and it turns out that ill condi- 
tioning found in 0{N) calculations is closely related to 
this; we will review the plane wave ill conditioning before 
describing the 0{N) ill conditioning. 



The plane wave energy functional (eq. A4) has large 
curvatures associated with high wavevector G, because 
of the form of the kinetic energy: 



Ev 



^Ef^E 



G 



2m 



CiG 



(16) 



so that the energy is proportional to G^. This first type 
of ill conditioning is easily cured (essentially by scaling 
CiG by a factor of (1 -I- G^/Gq)"^/^), and is referred to 
as 'length scale ill conditioning', as it comes from the 
variation of curvature with length scale. 

Another type of ill conditioning seen in conventional 
techniques is associated with the invariance of Etot under 
a unitary transformation of the orbitals. If the occupa- 
tion numbers fi are all either zero or one, then i^tot is 
exactly invariant under transformations such as: 



v-z = Yl ^'j'^i 



(17) 



where Uij is a unitary matrix. If the occupancies deviate 
slightly from zero or one, however, the exactness of the in- 
variance is broken, and the energy changes slightly. Some 
of the eigenvalues of the Hessian will go from exactly zero 
(under the exact transformation) to very small, leading 
to poor conditioning. We shall refer to this as 'superpo- 
sition ill conditioning'. In conventional techniques this is 



cured by performing a rotation of the wave functions so 
that the Hamiltonian becomes diagonal in the subspace 
spanned by the occupied states. 

A final type of ill conditioning found in conventional 
methods arises with variable occupation numbers, and is 
associated with eigenvalues whose energies are well above 
the Fermi energy, which will have very small occupation 
numbers. Variations of the ipi associated with these small 
occupation numbers will have little effect on the value of 
-Etot, and lead to small values of the curvature. Since 
the variations of these eigenvalues are almost redundant 
in the minimisation, we refer to this as 'redundancy ill 
conditioning'. 

All three of these forms can cause difficulties within 
0{N) techniques, though typically the specific solution 
will vary. For instance, it is clear that variations of the 
localised orbitals, {<t>ia], will have different length scales, 
and will suffer from length scale ill conditioning. This is 
easily cured in the same way as for plane waves, as has 
been recently demonstatede^, though the methodology is 
somewhat different. As a demonstration of the efficacy of 
this preconditioning, Figure shows the convergence of 
Conquest with and without length scale precondition- 
ing for three different region radii for the localised or- 
bitals. Clearly this problem becomes significantly worse 
for larger regions. 

Superposition ill conditioning is associated with the 
linear mixing of localised orbitals. It is easily shown that 
linear mixing of the functions on the same atom leaves 
-Etot unchanged, and so will not cause ill conditioning. 
Variations of the localised orbital 4)i„ such as: 






(18) 



are rather more interesting. Strictly, these are not pos- 
sible, as the localised orbitals are constrained to be zero 
outside their localisation regions. However, once the re- 
gion radii become large, there will be variations which 
almost fulfil this criterion. It is the small eigenvalues of 
the Hessian of Etot associated with this almost perfect 
mixing which will cause superposition ill conditioning. It 
is perfectly possible to cure this, however, as the form of 
the variations can be written down. We have developed a 
method to precondition these variations, and are testing 
it. It will be described in a future publication. 

Finally, we come to redundancy ill conditioning. Just 
as in conventional calculations this occurs when the occu- 
pation numbers fi are very small, this may occur in 0{N) 
techniques when the number of localised orbitals (pia is 
more than half the number of electrons. It is desirable, if 
not essential, to be able to work with an extended num- 
ber of orbitals; for instance, in group IV elements, the 
natural basis will consist of four orbitals, roughly cor- 
responding to one s and three p_qrbitals. (It is worth 
noting that Kim, Mauri and GallicZl have found that it is 
essential to have more orbitals than filled bands to avoid 
local minima in the energy functional in a related 0{N) 



scheme.) As before, we believe that this form of ill condi- 
tioning can be removed by suitable preconditioning, but 
detailed techniques have yet to be formulated. 



IV. FUTURE DIRECTIONS 



Having summarised the techniques involved in Con- 
quest, it is appropriate to look to the near future, and 
consider the directions in which the project is going. 



A. Forces 



The issue of forces is a key one for any electronic struc- 
ture technique; if relaxation of ions or molecular dy- 
namics are to be performed then the analytical forces 
must agree with the gradient of the energy. Con- 
quest has been specifically constructed so that, provided 
small Pulay-type correctionsC3 are included, the forces 
are guaranteed to be consistent with the energy. Pulay 
corrections are required because the B-spline basis func- 
tions move with the atoms, and are easily found, as will 
be described elsewhere. This means that the relaxation 
of the system to mechanical equilibrium and the genera- 
tion of time-dependent ionic trajectories will be feasible 
in 0{N) DFT calculations. 



B. Efficient choices for representation of localised 
orbitals 



C. Finding the density matrix for metals 



The question of modelling metals is much harder than 
insulators or semiconductors for the 0{N) methods de- 
scribed above, simply because the density matrix is more 
delocalised in metals, meaning that the cutoff applied to 
L (and hence to K) must be much larger to obtain accu- 
rate results. If the metal is close packed, as is frequently 
the case, this entails a rapid increase in the number of 
elements in the density matrix, and a slowing cLown of 
variational techniques; this is discussed elsewherecj. 

One approach is to reduce the range of the density 
matrix in metals by introducing an artificial electronic 
temperature, which broadens the Fermi occupation func- 
tion, and localises the density matrix. The drawback is 
a potentially large electronic entropy contribution to the 
energy; however, there is a scheme for extpapolating the 
results-back to zero electronic temperaturecj. It has been 
shownc3 that even with such a scheme the variational 
density matrix method described abo ve is i nefficient; the 
hybrid nmthod described in Section III A improves the 
efficiencytll. More efficient are recursion-based methods, 
such as the Feopi Operator Expansion£3 or the Bond Or- 
der PotentiaEJ'Ea. However, these have the disadvantage 
of not being strictly variational. 

A further possibility which has emerged recently is to 
use a series of nested HilbertpSpaces to find the exact zero 
temperature density matrixH. This method has the ad- 
vantage that no approximation is being made to remove 
the electronic entropy contributions, and allows high pre- 
cision calculations on metals. It is, however, still in de- 
velopment, and practical demonstrations have yet to be 
published. 



The present choice of basis for represent ing t he lo- 
calised orbitals has been described in Section [I A . How- 



ever, there are two good reasons for changing this basis 
somewhat. First, there is the problem of ripples in the 
energy caused by the numerical integration; this is due 
to a lack of translational invariance with respect to the 
integration grid. If the rapidly varying parts of the lo- 
calised orbitals (i.e. the core regions, which do not alter 
greatly during a calculation) could be represented in a 
more efficient form, then the integrals could be performed 
analytically, significantly reducing ripples. Second, there 
would be great value in being able to perform ah initio 
tight binding calculations with the code (or even to have 
certain parts of a supercell treated with full DFT, while 
others were treated with ah initio tight binding). For 
these reasons, we are planning to move to a mixed basis 
(seen recently in conventional pseudopotential calcula- 
tions) which combines pseudo-atomicuorbitals (possibly 
of the form of Sankey and NiklewskiEa) with a coarser 
blip grid. This will suppress the ripples with respect to 
the integration grid, and give the flexibility to model dif- 
ferent parts of the system with appropriate accuracy. 



V. CONCLUSIONS 



The recent developments which have been outlined 
above show that the future of 0{N) ah initio techniques 
is extremely bright. We have shown that the localisation 
of the density matrix gives the framework within which 
these methods can be constructed, and have given de- 
tails of the implementation of one such code. Conquest. 
The examples presented show that this method is practi- 
cal, and that the spatial cutoffs required for accuracy are 
small enough to make the calculations perfectly feasible. 
The search for the ground state has been addressed, and 
methods for making this search more robust and efficient 
have been discussed. The remaining tasks for the Con- 
quest code have been described, and the way forward for 
all of them is clear. The most important conclusion to 
draw from this body of work is that 0{N) DFT methods 
actually work. Indeed, these methods are being demon- 
strated in practical calculations. Our group is working 
towards practical application of the Conquest code to 
large-scale problems. 
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APPENDIX A: 



THE PSEUDOPOTENTIAL 
METHOD 



As Conquest uses the standard pseudopotential 
method, it is important to recall the salient facts 
within this, formalism; there are many excellent reviews 
elsewhereuj which give more detail. 

In the pseudopotential method, only the valence elec- 
trons are considered, and their interaction with the ionic 
cores is replaced by a pseudopotential, v{r). This means 
that in fact the solutions of the Schrodinger equation are 
pseudo-wavefunctions, and the charge density, n(r), is 
the pseudo-density of the valence electrons. The energy 
arising from the interaction between the cores and the 
valence electrons is given by: 



E,,^ j dvV{v)p{v), 



(Al) 



where V{y) is found as the sum over the ionic pseudopo- 
tentials: 



y(r)-^<|r-R, 



(A2) 



with Ri the core positions. In general, the pseudopoten- 
tial is non-local, that is i'(r,r'). 

In conventional pseudopotential ah initio techniques, 
the wavefunctions are often expanded in terms of plane 
waves: 



V-i = ^ CjGexp(zG • r). 



(A3) 



G 



where G is a reciprocal lattice vector. The total energy 
is then minimised with respect to the coefficients qg- 
Frequently, particularly in metals, variable occupation 
numbers are allowed, so that the wave function i/;, has 
an occupation fi. This gives a total energy function: 



£;tot=Stot({c.G},{/^}). 



(A4) 



lowered indices to distinguish between vectors and ma- 
trices which transform differently; this has been intro- 
duced to electronic structure calculations by Ballentine 
and KolaiO, who also describe the general formalism. 

The eigenfunctions of the Hamiltonian are expanded 
in a set of non-orthogonal, localised, atom-centred or- 
bitals {0c((r)}, where a runs over all orbitals on all atoms. 
These orbitals span a Hilbert space V, and define an over- 
lap matrix which is given by Sap = {4>a \ 4>f3)- The in- 



verse overlap matrix, (S 



_n^Q/3 



is defined by the relation: 



= 1 


if a = 7 


= 


if a ^ 7 



(Bl) 



A dual space, V* , exists which is spanned by the or- 
bitals I 0") = Y.0 {S'^Y^ I (j)p). These two sets of 
vectors are bi-orthogonal, that is ((/)" | 0^) = 60. It is im- 
portant to note that contraction can only be carried out 
over indices which are opposed, while addition can only 
be carried out between tensors for which indices agree. 
The vectors in the original space are called covariant, 
while the vectors in the dual space are contravariant. 



A convenient choice for the metric of V is S* ^ , so that 
J^ai^'a I {S~^Y I 4>is) = S2i- An equivalent choice of 
metric for V* is Sap- These operate to change a vector 
in one space to the vector in another; thus a proper scalar 
product within a space can be formed by incorporation 
of the metric. A covariant operator can be represented 
as an outer product: 



i = ^ I cj,a)A 



aPi 



(B2) 



Q,/3 



Then the scalar product of two covariant operators is 
written: 



(A,B)= Y. {<l>5\(t>0)AP'^{<j>a\ci>,)B'^', 

oi, 13,^,5 

^TriA'fSBS]. (B3) 



APPENDIX B: METRICS AND MINIMISATION 
IN NON-ORTHOGONAL BASIS SETS 

When working with a non-orthogonal basis, care must 
be taken with notation. It is common to use raised and 



It is important to note that the product of a covariant 
and a contravariant pair (such as H and p) is invariant 
with basis set. Traditionally, the Hamiltonian is taken as 
covariant and the density matrix as contravariant. 



1. Variation of L 

The innermost part of CONQUEST consists of the min- 
imisation of the energy with respect to the elements of 
the matrix L*".'^ with fixed support functions. This 
is achieved by performing hne minimisations along di- 
rections supplied by the conjugate gradients algorithm 
(with, on occasion, a correction for maintaining the elec- 
tron wumber constant). As pointed out recently by White 
et alxB, the gradient of the energy with respect to L*"J^, 
Vr2, (which is used as the search direction in the minimi- 
sation) is actually covariant, while L*"-'^ is contravari- 
ant; this means that the gradient must be transformed 
to a contravariant matrix before being combined with 
the density matrix: {S~^)'Vfl{S~^). But there is more 
to the problem of minimisation than just this; conjugate 
gradients assumes the following relations: 



7» = 






-V/(P.+i) 



i'l+l ■ 5t+l 

Si ' Si 



(B4) 



where gi is the gradient of the function at step i and h^ is 
the search direction (which is conjugate to the previous 
search directions) at step i. Clearly in the formation of 
7i , care must be taken to ensure that the product is ten- 
sorially correct, otherwise the choice of the new search 
direction will be wrong, so the correct formula for 7^ is: 



g,+i-(^^i)g,+i(^-i) 
g, • (5-i)g,(5-i) ' 



(B5) 



(This is discussed at more length in Section 2.7 of Ref.Ej, 
where the bi-conjugate gradient method is described. 
This degree of complexity is not needed here.) Simi- 
lar care both with the correct nature of gradients and 
with the search directions must be taken when varying 
the localised orbitals. 
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FIG. 1. Convergence to the true ground state energy of Si for Conquest as the cut-off apphed p is increased: (a) Rr. 
increasing with Rl effectively infinite; (b) Rl increasing with Rjog held fixed. See text for details. 




FIG. 2. The McWeeny purification function, /(A) = 3A^ - 2A^ 
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FIG. 3. The difference between the cohesive energy at a given iteration and the final cohesive energy for the LNV stage of 
the hybrid scheme (solid lines) and the pure LNV scheme (dashed lines) for diamond structure carbon. Results are shown for 
different cut-off radii: 3 hops (circles), 5 hops (squares) and 7 hops (diamonds), whore a hop is a Hamiltonian nearest neighbour 
distance. 




FIG. 4. A function with elongated surfaces of constant / 
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FIG. 5. Convergence to the ground state of the Si crystal calculated with (solid lines) and without (dashed lines) precondi- 
tioning. Results are for region radii of 2.72 A (circles), 3.40 A (squares) and 3.80 A (diamonds). 
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